Evaluate the results of the A/B test.
Test: recommender_system_test. Key events: product_page, product_cart.

ab_project_marketing_events.csv — calendar of marketing events for 2020:
- name — the name of the marketing event;
- regions — regions where the advertising campaign will be conducted;
- start_dt — campaign start date;
- finish_dt — campaign end date.

final_ab_new_users.csv — users registered from December 7 to December 21, 2020:
- user_id — user ID;
- first_date — registration date;
- region — user's region;
- device — the device from which the registration took place.

final_ab_events.csv — actions of new users in the period from December 7, 2020 to January 4, 2021:
- user_id — user ID;
- event_dt — date and time of the event;
- event_name — event type;
- details — additional data about the event (for purchase events, this field stores the purchase price in dollars).

final_ab_participants.csv — table of test participants:
- user_id — user ID;
- ab_test — name of the test;
- group — the user's group.
import math as mth

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from plotly import graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats as st
import scipy.stats as stats  # same module under a second alias, used in the cells below
mevents = pd.read_csv('data/ab_project_marketing_events.csv', parse_dates=['start_dt', 'finish_dt'])
new_users = pd.read_csv('data/final_ab_new_users.csv', parse_dates=['first_date'])
events = pd.read_csv('data/final_ab_events.csv', parse_dates=['event_dt'])
participants = pd.read_csv('data/final_ab_participants.csv')
def info(data):
    """Print a quick overview of a dataframe: sample rows, dtypes, missing values, duplicates."""
    print('------------- First 5 lines ------------')
    display(data.sample(5))
    print('------------- Data types ------------')
    display(data.info())
    print('------------- Gaps ------------')
    for element in data.columns:
        if data[element].isna().sum() > 0:
            print(element, '-', data[element].isna().sum())
        else:
            print(element, '- None')
    print('------------- Duplicates ------------')
    if data.duplicated().sum() > 0:
        print(data.duplicated().sum())
    else:
        print('No Duplicates')
info(mevents)
------------- First 5 lines ------------
| | name | regions | start_dt | finish_dt |
|---|---|---|---|---|
| 7 | Labor day (May 1st) Ads Campaign | EU, CIS, APAC | 2020-05-01 | 2020-05-03 |
| 2 | St. Patric's Day Promo | EU, N.America | 2020-03-17 | 2020-03-19 |
| 3 | Easter Promo | EU, CIS, APAC, N.America | 2020-04-12 | 2020-04-19 |
| 4 | 4th of July Promo | N.America | 2020-07-04 | 2020-07-11 |
| 9 | Victory Day CIS (May 9th) Event | CIS | 2020-05-09 | 2020-05-11 |
------------- Data types ------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   name       14 non-null     object
 1   regions    14 non-null     object
 2   start_dt   14 non-null     datetime64[ns]
 3   finish_dt  14 non-null     datetime64[ns]
dtypes: datetime64[ns](2), object(2)
memory usage: 576.0+ bytes
None
------------- Gaps ------------
name - None
regions - None
start_dt - None
finish_dt - None
------------- Duplicates ------------
No Duplicates
info(new_users)
------------- First 5 lines ------------
| | user_id | first_date | region | device |
|---|---|---|---|---|
| 20686 | 8D82ADDBFE65523F | 2020-12-08 | EU | Android |
| 29672 | FD348F3ADE080914 | 2020-12-16 | EU | Android |
| 38907 | 2B5E05748037FB47 | 2020-12-17 | EU | iPhone |
| 24442 | FC8402CECBF11EC3 | 2020-12-22 | EU | Android |
| 42337 | DB214579327A26BE | 2020-12-18 | N.America | Mac |
------------- Data types ------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 61733 entries, 0 to 61732
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   user_id     61733 non-null  object
 1   first_date  61733 non-null  datetime64[ns]
 2   region      61733 non-null  object
 3   device      61733 non-null  object
dtypes: datetime64[ns](1), object(3)
memory usage: 1.9+ MB
None
------------- Gaps ------------
user_id - None
first_date - None
region - None
device - None
------------- Duplicates ------------
No Duplicates
info(events)
------------- First 5 lines ------------
| | user_id | event_dt | event_name | details |
|---|---|---|---|---|
| 86352 | 6BA6AEC3DCCFE307 | 2020-12-16 12:52:11 | product_cart | NaN |
| 119260 | DFE4947F37950182 | 2020-12-26 03:17:09 | product_cart | NaN |
| 8027 | 1FD7211E6E12C1A5 | 2020-12-11 19:00:36 | purchase | 4.99 |
| 222872 | 176E733079182354 | 2020-12-23 00:10:52 | product_page | NaN |
| 206356 | F196027E61BD9325 | 2020-12-21 17:45:27 | product_page | NaN |
------------- Data types ------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 440317 entries, 0 to 440316
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   user_id     440317 non-null  object
 1   event_dt    440317 non-null  datetime64[ns]
 2   event_name  440317 non-null  object
 3   details     62740 non-null   float64
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 13.4+ MB
None
------------- Gaps ------------
user_id - None
event_dt - None
event_name - None
details - 377577
------------- Duplicates ------------
No Duplicates
info(participants)
------------- First 5 lines ------------
| | user_id | group | ab_test |
|---|---|---|---|
| 1212 | 235E6B4478EB2D3D | A | recommender_system_test |
| 9356 | 70FE64008F8E2E53 | A | interface_eu_test |
| 6518 | E471AC551D5DC731 | A | recommender_system_test |
| 18252 | CA6F4DAED160E5B1 | A | interface_eu_test |
| 9965 | 530D9E62F6CF3F3F | B | interface_eu_test |
------------- Data types ------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18268 entries, 0 to 18267
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   user_id  18268 non-null  object
 1   group    18268 non-null  object
 2   ab_test  18268 non-null  object
dtypes: object(3)
memory usage: 428.3+ KB
None
------------- Gaps ------------
user_id - None
group - None
ab_test - None
------------- Duplicates ------------
No Duplicates
events.query('details == details')['event_name'].unique()
array(['purchase'], dtype=object)
All column names are correct.
There are no duplicates in the data.
Missing values appear only in events['details'] (377,577 rows): this field is filled in only for the purchase event.
data = events.merge(participants, on='user_id', how='left')
data = data.query('ab_test == "recommender_system_test"')
data = data.merge(new_users, on='user_id', how='left')
data['lifetime'] = (data['event_dt'] - data['first_date']).dt.days
# Users enrolled in both tests at once ("defectors") are excluded from the analysis;
# take the index so the query below checks against user IDs, not a dataframe
defectors = participants.groupby('user_id').agg({'ab_test': 'nunique'}).query('ab_test == 2').index
data = data.query('user_id not in @defectors')
data.sample(5)
| | user_id | event_dt | event_name | details | group | ab_test | first_date | region | device | lifetime |
|---|---|---|---|---|---|---|---|---|---|---|
| 12672 | C573C24EF423143C | 2020-12-25 22:24:16 | product_page | NaN | A | recommender_system_test | 2020-12-15 | EU | iPhone | 10 |
| 8728 | 8EA216D0FDB06620 | 2020-12-16 08:49:13 | product_page | NaN | A | recommender_system_test | 2020-12-16 | EU | Android | 0 |
| 18175 | 1B72B5CD89B93B68 | 2020-12-18 02:15:41 | login | NaN | A | recommender_system_test | 2020-12-14 | EU | PC | 4 |
| 24120 | A9DABAEA2233576C | 2020-12-27 17:14:50 | login | NaN | A | recommender_system_test | 2020-12-19 | EU | Android | 8 |
| 8147 | FC579467441F9567 | 2020-12-15 00:26:04 | product_page | NaN | B | recommender_system_test | 2020-12-07 | EU | PC | 8 |
display('Groups: A — control, B — new payment funnel', participants['group'].unique())
'Groups: A — control, B — new payment funnel'
array(['A', 'B'], dtype=object)
events['date'] = events['event_dt'].dt.date
display('Launch date: 2020-12-07', events['date'].min().strftime('%Y-%m-%d'))
'Launch date: 2020-12-07'
'2020-12-07'
display('Expected new-user recruitment stop date: 2020-12-21; actual:', new_users['first_date'].dt.date.max().strftime('%Y-%m-%d'))
'Expected new-user recruitment stop date: 2020-12-21; actual:'
'2020-12-23'
display('Expected stop date: 2021-01-04; actual:', events['date'].max().strftime('%Y-%m-%d'))
'Expected stop date: 2021-01-04; actual:'
'2020-12-30'
total_users = data['user_id'].nunique()
display('Test participants distribution by region:')
for i in data['region'].unique():
    print(f' Region {i} : {data[data["region"] == i]["user_id"].nunique()/total_users:.2%}')
'Test participants distribution by region:'
Region EU : 94.72%
Region N.America : 3.24%
Region CIS : 0.82%
Region APAC : 1.22%
display('Expected share of new EU users in the test: at least 15%; actual:', "{:.2%}".format(data[data["region"] == "EU"]["user_id"].nunique()/new_users[new_users["region"] == "EU"]["user_id"].nunique()))
'Expected share of new EU users in the test: at least 15%; actual:'
'7.52%'
display('Expected number of test participants: 6,000; actual:', data['user_id'].nunique())
'Expected number of test participants: 6,000; actual:'
3675
Compliance with the specification:
- Launch date — corresponds (2020-12-07).
- Test name and groups — correspond.
- New-user recruitment stop date — does not match (2020-12-23 instead of 2020-12-21).
- Test stop date — does not match (2020-12-30 instead of 2021-01-04).
- Share of new EU users in the test — does not match (about 8% instead of 15%).
- Number of participants — does not match (3,675 instead of 6,000).
test_start = pd.Timestamp(events['date'].min())
test_finish = pd.Timestamp(events['date'].max())
def f(row):
    """Flag marketing events that are running on the test start date."""
    return 1 if row['start_dt'] <= test_start <= row['finish_dt'] else 0

def g(row):
    """Flag marketing events that are running on the test finish date."""
    return 1 if row['start_dt'] <= test_finish <= row['finish_dt'] else 0

mevents['test_start'] = mevents.apply(f, axis=1)
mevents['test_finish'] = mevents.apply(g, axis=1)
mevents['test_isin'] = mevents['test_start'] + mevents['test_finish']
display('Tests overlap with the following holidays:', mevents.query("test_isin > 0"))
'Tests overlap with the following holidays:'
| | name | regions | start_dt | finish_dt | test_start | test_finish | test_isin |
|---|---|---|---|---|---|---|---|
| 0 | Christmas&New Year Promo | EU, N.America | 2020-12-25 | 2021-01-03 | 0 | 1 | 1 |
| 10 | CIS New Year Gift Lottery | CIS | 2020-12-30 | 2021-01-07 | 0 | 1 | 1 |
display(f'Test names: {participants["ab_test"].unique()}' )
"Test names: ['recommender_system_test' 'interface_eu_test']"
display('Number of users participating in two tests simultaneously:')
display(len(participants.groupby('user_id').agg({'ab_test':'nunique'}).query('ab_test == 2')))
display('Number of users participating in the two test groups "recommender_system_test" simultaneously:')
display(len(participants.query('ab_test == "recommender_system_test"').groupby('user_id').agg({'group':'nunique'}).query('group == 2')))
display('Uniformity of distribution across test groups check')
display(participants.query('ab_test == "recommender_system_test"').groupby('group')['user_id'].nunique())
'Number of users participating in two tests simultaneously:'
1602
'Number of users participating in the two test groups "recommender_system_test" simultaneously:'
0
'Uniformity of distribution across test groups check'
group
A    3824
B    2877
Name: user_id, dtype: int64
events['event_name'].value_counts()
login           189552
product_page    125563
purchase         62740
product_cart     62462
Name: event_name, dtype: int64
fig = make_subplots(rows=1, cols=2)
fig.update_layout(showlegend=False, height=800, width=1000,
                  title='Product funnel of control group A and test group B',
                  title_font_size=20)
index = ['login', 'product_page', 'product_cart', 'purchase']
# Only events within the first 14 days of a user's lifetime count toward the funnel
funnel_a = (data.query('lifetime <= 14 and group == "A"')
            .groupby(by='event_name')['user_id'].nunique()
            .reindex(index=index).reset_index())
funnel_b = (data.query('lifetime <= 14 and group == "B"')
            .groupby(by='event_name')['user_id'].nunique()
            .reindex(index=index).reset_index())
fig.add_trace(go.Funnel(
x = funnel_a['user_id'] ,
y = funnel_a['event_name'],
textinfo = "value+percent previous+percent initial",
marker = {"color": ["#EBF63A", "#8DF63A", "#3AF6A3", "#3A8DF6"]}),
row=1, col=1)
fig.add_trace(go.Funnel(
x = funnel_b['user_id'] ,
y = funnel_b['event_name'],
textinfo = "value+percent previous+percent initial",
marker = {"color": ["#EBF63A", "#8DF63A", "#3AF6A3", "#3A8DF6"]}),
row=1, col=2)
fig.show()
Expected effect: within 14 days of registration, users should show at least a 10% improvement in each metric. Instead:
- Conversion to product page views (product_page) fell by more than 10% over the 14-day lifetime: 65% in control group A vs 56% in test group B.
- Cart views (product_cart) rose by 3%: group A — 46%, group B — 49%.
- Purchases (purchase) fell by 6%: group A — 106%, group B — 100% (values above 100% are possible because some users purchase without a recorded product_cart event).
The test does not comply with the specification on the following points:
- new-user recruitment stopped on 2020-12-23 instead of 2020-12-21;
- the test stopped on 2020-12-30 instead of 2021-01-04;
- about 95% of participants come from the EU region, while only ~8% of new EU users entered the test instead of the expected 15%;
- 3,675 participants instead of the expected 6,000.

Expected effect: within 14 days of registration, each metric should improve by at least 10%. Instead:
- product_page conversion dropped by 11%;
- product_cart conversion increased by 3%;
- purchase conversion dropped by 6%.

The test overlaps with the "Christmas&New Year Promo" marketing event in EU and N.America.
The crossover between the two competing tests was 1,602 users.
There were no intersections of users between the groups of the "recommender_system_test" test.
The groups of the "recommender_system_test" test are unequal in size: 3,824 users in group A vs 2,877 in group B.
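The 3,824 vs 2,877 split can be checked formally with a sample-ratio-mismatch test, i.e. a chi-square goodness-of-fit test against the 50/50 allocation the design implies:

```python
from scipy import stats as st

observed = [3824, 2877]                # users in groups A and B
expected = [sum(observed) / 2] * 2     # the 50/50 split a balanced design implies
chi2, p_value = st.chisquare(observed, f_exp=expected)
print(f'SRM p-value: {p_value:.2e}')
# A tiny p-value means the imbalance is very unlikely under random 50/50 assignment
```

Here the p-value is far below any reasonable threshold, confirming the assignment mechanism was broken rather than merely unlucky.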
part_rec_a = participants.query('ab_test == "recommender_system_test" and group == "A"')
part_rec_b = participants.query('ab_test == "recommender_system_test" and group == "B"')
ea = events.query('user_id in @part_rec_a.user_id')
eb = events.query('user_id in @part_rec_b.user_id')
events_user_a = ea.groupby('user_id', as_index=False).agg({'event_name':'count'})
events_user_b = eb.groupby('user_id', as_index=False).agg({'event_name':'count'})
display( 'Distribution of the number of events per user in Group A', events_user_a.describe())
display( 'Distribution of the number of events per user in Group B', events_user_b.describe())
'Distribution of the number of events per user in Group A'
| | event_name |
|---|---|
| count | 2747.000000 |
| mean | 7.027303 |
| std | 3.868983 |
| min | 1.000000 |
| 25% | 4.000000 |
| 50% | 6.000000 |
| 75% | 9.000000 |
| max | 24.000000 |
'Distribution of the number of events per user in Group B'
| | event_name |
|---|---|
| count | 928.000000 |
| mean | 5.812500 |
| std | 3.483878 |
| min | 1.000000 |
| 25% | 3.000000 |
| 50% | 5.000000 |
| 75% | 8.000000 |
| max | 28.000000 |
events_user_a
| | user_id | event_name |
|---|---|---|
| 0 | 0010A1C096941592 | 12 |
| 1 | 00341D8401F0F665 | 2 |
| 2 | 003DF44D7589BBD4 | 15 |
| 3 | 00505E15A9D81546 | 5 |
| 4 | 006E3E4E232CE760 | 6 |
| ... | ... | ... |
| 2742 | FF44696E39039D29 | 6 |
| 2743 | FF5A1CD38F5DD996 | 10 |
| 2744 | FF5B24BCE4387F86 | 9 |
| 2745 | FF825C1D791989B5 | 8 |
| 2746 | FFAE9489C76F352B | 6 |
2747 rows × 2 columns
display('Distribution of the number of events per user')
# The groups contain different numbers of users, so they cannot be paired
# point-by-point in a jointplot; compare the two distributions as overlaid histograms
fig, ax = plt.subplots(figsize=(12, 8))
sns.histplot(events_user_a['event_name'], color='steelblue', label='Group A', ax=ax)
sns.histplot(events_user_b['event_name'], color='coral', label='Group B', ax=ax)
ax.set_xlabel('Events per user')
ax.legend();
'Distribution of the number of events per user'
Null hypothesis
H0: There are no differences in the number of events per user between the groups.
Alternative hypothesis
H1: There are differences in the number of events per user between groups.
print('P-value = ' , "{:.20f}".format(stats.mannwhitneyu(events_user_a["event_name"], events_user_b["event_name"])[1]))
print('Relative differences between groups = {0:.3f}'.format(events_user_b['event_name'].mean()/events_user_a['event_name'].mean()-1))
P-value =  0.00000000000000000037
Relative differences between groups = -0.173
P-value = 0.00000000000000000037 is less than 0.05, so we reject the null hypothesis: there are statistically significant differences in the number of events per user between the groups.
Group B has on average 17% fewer events per user than group A, and this difference is statistically significant.
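A bootstrap confidence interval would add an uncertainty estimate to that ~17% gap. The sketch below uses synthetic Poisson samples as stand-ins for events_user_a / events_user_b (same sizes and roughly the same means as reported above), since the raw data is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-ins: group A averages ~7 events/user (n=2747), group B ~5.8 (n=928)
group_a = rng.poisson(7.0, size=2747)
group_b = rng.poisson(5.8, size=928)

# Bootstrap the relative difference of means, B/A - 1
boot = np.empty(5000)
for i in range(5000):
    a = rng.choice(group_a, size=group_a.size, replace=True)
    b = rng.choice(group_b, size=group_b.size, replace=True)
    boot[i] = b.mean() / a.mean() - 1
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f'95% CI for relative difference: [{lo:.3f}, {hi:.3f}]')
```

With the real data, an interval lying entirely below zero would agree with the Mann-Whitney result that group B is genuinely behind.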
events_date_a = ea.groupby('date', as_index=False).agg({'event_name':'count'})
events_date_b = eb.groupby('date', as_index=False).agg({'event_name':'count'})
display( 'Distribution of the number of events by day in Group A', events_date_a.describe())
display( 'Distribution of the number of events by day in Group B', events_date_b.describe())
'Distribution of the number of events by day in Group A'
| | event_name |
|---|---|
| count | 23.000000 |
| mean | 839.304348 |
| std | 486.445964 |
| min | 328.000000 |
| 25% | 379.500000 |
| 50% | 677.000000 |
| 75% | 1168.000000 |
| max | 2011.000000 |
'Distribution of the number of events by day in Group B'
| | event_name |
|---|---|
| count | 24.000000 |
| mean | 224.750000 |
| std | 108.300407 |
| min | 4.000000 |
| 25% | 146.750000 |
| 50% | 217.500000 |
| 75% | 298.000000 |
| max | 430.000000 |
fig, ax = plt.subplots(figsize=(16, 8))
sns.lineplot(data=events_date_a, x='date', y='event_name', ax=ax, label='Group A')
sns.lineplot(data=events_date_b, x='date', y='event_name', ax=ax, label='Group B');
Comparing the funnels shows that the conversion rate in test group B worsened at every stage.
This may be due to the overlap with the holidays and the incompleteness of the test: user recruitment ran 2 days longer than planned and the test was stopped 5 days early, which leaves most users without a full 14-day observation window.
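That claim can be made concrete: with the last event on 2020-12-30, only users registered on or before 2020-12-16 have a full 14-day observation window. A quick check using the dates reported above:

```python
import pandas as pd

last_event = pd.Timestamp('2020-12-30')     # last observed event date
recruit_start = pd.Timestamp('2020-12-07')  # start of new-user recruitment
recruit_end = pd.Timestamp('2020-12-21')    # planned end of recruitment

# Latest registration date that still leaves a full 14-day observation window
cutoff = last_event - pd.Timedelta(days=14)
print(cutoff.date())

# Share of the planned recruitment days whose registrations get a full window
usable = (cutoff - recruit_start).days + 1
planned = (recruit_end - recruit_start).days + 1
print(f'{usable / planned:.0%} of the recruitment days allow a full 14-day window')
```

So roughly a third of the planned recruitment period yields users whose 14-day lifetime is cut short by the early stop.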
Null hypothesis
H0: There are no differences in conversion to the target event between the groups.
Alternative hypothesis
H1: There are differences in conversion to the target event between the groups.
def z_test(successes1, successes2, trials1, trials2, alpha=0.05):
    """Two-sided two-proportion z-test for the difference between conversion rates."""
    p1 = successes1 / trials1
    p2 = successes2 / trials2
    p_combined = (successes1 + successes2) / (trials1 + trials2)
    difference = p1 - p2
    z_value = difference / mth.sqrt(p_combined * (1 - p_combined) * (1 / trials1 + 1 / trials2))
    distr = st.norm(0, 1)
    p_value = (1 - distr.cdf(abs(z_value))) * 2
    print('p-value: ', '%.20f' % p_value)
    if p_value < alpha:
        display('Reject the null hypothesis, there are statistically significant differences between the samples')
    else:
        display('It was not possible to reject the null hypothesis, there are no statistically significant differences in the samples')
for success in index[1:]:
    target = 'login'
    print(f'Difference in event conversion to "{success}" between group A and group B')
    trials1 = data.query('group == "A" and event_name == @target')['user_id'].nunique()
    trials2 = data.query('group == "B" and event_name == @target')['user_id'].nunique()
    successes1 = data.query('group == "A" and event_name == @success')['user_id'].nunique()
    successes2 = data.query('group == "B" and event_name == @success')['user_id'].nunique()
    z_test(successes1, successes2, trials1, trials2, alpha=0.05)
Difference in event conversion to "product_page" between group A and group B
p-value:  0.00000431098055475587
'Reject the null hypothesis, there are statistically significant differences between the samples'
Difference in event conversion to "product_cart" between group A and group B
p-value:  0.14534814557238195931
'It was not possible to reject the null hypothesis, there are no statistically significant differences in the samples'
Difference in event conversion to "purchase" between group A and group B
p-value:  0.01759240266331474345
'Reject the null hypothesis, there are statistically significant differences between the samples'
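As a sanity check on the z-test logic, identical proportions should give a p-value of exactly 1 and a clearly different pair a very small one. The standalone helper below repeats the same formula as z_test:

```python
import math

from scipy import stats as st


def two_prop_p_value(s1, s2, n1, n2):
    """p-value of a two-sided two-proportion z-test (same formula as z_test above)."""
    p1, p2 = s1 / n1, s2 / n2
    p_pool = (s1 + s2) / (n1 + n2)
    z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return 2 * (1 - st.norm.cdf(abs(z)))


print(two_prop_p_value(500, 500, 1000, 1000))   # identical proportions -> p = 1
print(two_prop_p_value(600, 500, 1000, 1000))   # clearly different proportions -> tiny p
```

The invented counts are illustrative only; the point is that the formula behaves correctly at both extremes.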
Comparing the funnels shows that test group B's conversion worsened at two stages. This may be due to the overlap with the holidays and the incompleteness of the test: user recruitment ran 2 days longer than planned and the test was stopped 5 days early, which leaves most users without a full 14-day observation window. There was also a sharp spike in the number of events in group A on December 13.
According to the test results, there are significant differences between the groups' conversion to "product_page" and "purchase".
There are also significant differences in the number of events per user: group B has 17% fewer events per user than group A.
These results should not be relied upon, since the test was conducted incorrectly and most requirements of the technical specification were not met.
It is recommended to rerun the test in compliance with the specification, outside of holiday promotions.